Faster Longest Common Extension Queries in Strings over General Alphabets

Authors

  • Pawel Gawrychowski
  • Tomasz Kociumaka
  • Wojciech Rytter
  • Tomasz Walen
Abstract

Longest common extension (LCE) queries, often called longest common prefix queries, are a fundamental building block in many string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(q log log n + n log* n) time making only O(q + n) symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in O(n log log n) time making O(n) symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with O(n log^{2/3} n) running time and conjectured that O(n) time is possible. We make significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, where the time increases to O(q log n + n log* n). The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).
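
For orientation, an LCE query on positions i and j asks for the length of the longest common prefix of the suffixes starting at i and j. The sketch below is a naive baseline only, not the algorithm of the paper; the function name `lce` and the 0-based indexing are assumptions made for illustration. It uses nothing but symbol comparisons, which matches the general-alphabet setting, but a single query can cost O(n) comparisons, which is the cost the paper's techniques are designed to avoid.

```python
def lce(s, i, j):
    """Naive LCE: length of the longest common prefix of s[i:] and s[j:].

    Hypothetical helper for illustration; positions are 0-based.
    """
    n = len(s)
    k = 0
    # Extend the match one symbol at a time, using only equality tests
    # on symbols (no assumptions about the alphabet size or encoding).
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    # Suffixes "abababa" and "ababa" share the prefix "ababa".
    print(lce("abababa", 0, 2))
```

Answering q queries this way costs O(qn) comparisons in the worst case, compared with the O(q + n) comparisons stated in the abstract.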

Similar resources

Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Strings with don’t care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but still...

Development of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique (CMLCS) for Biological Sequences Prediction

A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence (MLCS) problem are based on dynamic programming. The MLCS problem is to find the longest subsequence shared between two or more strings. For over 30 years, significant efforts have been made to find ef...

Two Algorithms for LCS Consecutive Suffix Alignment

The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we prese...

A Faster Longest Common Extension Algorithm on Compressed Strings and its Applications

In this talk, we introduce our recent data structure for longest common extension (LCE) queries on grammar-compressed strings. Our preprocessing input is a straight-line program (SLP) of size n describing a string w of length N, which is essentially a CFG in the Chomsky normal form generating only w. We can preprocess the input SLP in O(n log log n log N log* N) time so that later, given two va...

Efficient Dominant Point Algorithms for the Multiple Longest Common Subsequence (MLCS) Problem

Finding the longest common subsequence of multiple strings is a classical computer science problem and has many applications in the areas of bioinformatics and computational genomics. In this paper, we present a new sequential algorithm for the general case of MLCS problem, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer ...

Publication date: 2016